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Abstract 

In this paper, we propose a novel information theoretic model to interpret the entire “transmission 
chain” comprising stimulus generation, brain processing by the human subject, and the electroencephalo¬ 
graph (EEG) response measurements as a nonlinear, time-varying communication channel with memory. 
We use mutual information (MI) as a measure to assess audio quality perception by directly measuring 
the brainwave responses of the human subjects using a high resolution EEG. Our focus here is on 
audio where the quality is impaired by time varying distortions. In particular, we conduct experiments 
where subjects are presented with audio whose quality varies with time between different possible 
quality levels. The recorded EEG measurements can be modeled as a multidimensional Gaussian mixture 
model (GMM). In order to make the computation of the MI feasible, we present a novel low-complexity 
approximation technique for the differential entropy of the multidimensional GMM. We find the proposed 
information theoretic approach to be successful in quantifying subjective audio quality perception, with 
the results being consistent across different music sequences and distortion types. 

Index Terms 

Mutual information, perception, audio quality, electroencephalography (EEG), Gaussian mixture 
model (GMM) 
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I. Introduction 


An often used metaphor for the human brain is that of an information proeessing device. Our 
perception of the world around is received as input to the brain via sensory nerves. Detecting, 
interpreting, and responding to a stimulus is a multistage process that results in the activation 
and hierarchical interaction of several different regions in the brain. In these hierarchies, sensory 
and motor information is represented and manipulated in the form of neural activity, where 
neurons transfer information about stimulus features by successively transmitting impulse trains 
of electrical or chemical signals. Information in the brain is therefore inherently encoded as 
sequences of neural activity patterns. The superposition of these activity patterns can be non- 
invasively recorded via electroencephalography (EEG) by placing sensors on the scalp. In the 
following we provide a novel information theoretic model to analyze the overall transmission 
chain comprising of input stimulus generation, processing by the human brain, and the output 
EEG brain activity, as a time-varying, nonlinear communication channel with memory. 

Information theory (IT) provides a stochastic framework which is also well suited to char¬ 
acterize and model neural response Q-[|7|. In fact, soon after Shannon’s initial work Q, the 
concept of information was applied to calculate the capacity and bounds of neural information 
transfer [|9|-pT|. In particular, entropy and mutual information (MI) [12| can be used to quan¬ 
tify how much information a neural response contains about a stimulus Q, |13|. In [14| the 
author discusses local expansions for the nonparametric estimation of entropy and MI useful 
for neural data analysis. Eurther, p3| |, present a power-series-based component expansion 
for calculating the MI in order to analyze information conveyed by the spike timing of neuron 
firing in the somatosensory cortex of a rat. In pT| | different approximations of Gaussian mixture 
models are used to bound the MI transmission from stimulus to response in the cereal sensory 
system of a cricket. 

In contrast to these works which assess information transfer on the neural level, we focus 
on the “high-level” information flow between stimulus and EEG sensor observations. In such a 
setup MI has also has been successfully employed in the past for EEG feature extraction and 
classification purposes 118 |-pT|. Notably in p^ , MI is used to describe information transmission 
among different cortical areas during waking and sleep states using EEG measurements. Other 
researchers have similarly used MI to analyze EEG data to investigate corticocortical information 


transmission for pathological conditions such as Alzheimer’s disease |23| and schizophrenia [24|, 
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for odor stimulation [ [25| , [ |26| |, and even for motor disorders such as Parkinson’s disease [|27|. 
For subjective quality testing of audio the current state-of-the-art approach is Multi Stimulus 


with Hidden Anchor (MUSHRA) p8[ |. One characteristic that MUSHRA and most of the other 
existing audio and video testing protocols have in common is that each human participant assigns 
a single quality-rating score to each test sequence. Such testing suffers from a subject-based 
bias towards cultural factors in the local testing environment and tends to be highly variable. 
In contrast, by using EEG to directly analyze the brainwave patterns we capture fast, early 
brain responses that depend only on the perceived variation in signal quality. Because of these 
reasons there has recently been a growing interest in using EEGs to classify human perception 
of audio p9| , [ [30[ and visual quality pT]|-p4|. Eor example, p0| investigates the use of a 
time-space-frequency analysis to identify features in EEG brainwave responses corresponding 
to time-varying audio quality. Eurther, p9| , [311 propose to use linear discriminant analysis 
(EDA) classifiers to extract features based on the P300 ERP component [[35|, [|3^ for classifying 


noise detection in audio signals and to assess changes in perceptual video quality, respectively. 


Similarly, in the authors employ a wavelet-based approach for an EEG-classification of 
commonly occurring artifacts in compressed video, using a single-trial EEG. 

Our approach here is different compared to above-mentioned studies in that we use MI to 
quantitatively measure how accurately the quality of the audio stimulus is transmitted over 
the brain response channel, where the observation at the channel output is given by the EEG 
measurements. In other words, we ask how useful EEG measurements are to assess subjective 
audio quality. How well can the audio quality be blindly estimated (i.e., without knowing the 
original audio quality) from the EEG measurements? This could be useful for example for 
designing custom hearing aids optimized based on perceived audio quality or for a subjective 
assessment of audio streams which slowly vary their quality with time. Eor the sake of simplicity 
and analytical tractability we restrict ourselves to the worst case scenario of only two audio 
quality levels (high quality and bad quality). 

We show that the EEG measurements can be modeled as a multidimensional Gaussian mixture 
model (GMM). A direct computation of the MI from the EEG measurements is not feasible 
due to the high dimensionality of the data captured by using a large number of EEG sensors. 
Instead, we present a low-complexity approximation technique for the differential entropy of the 
multidimensional GMM based on a Taylor series expansion, which is essential for carrying out 
the MI computation problem in acceptable time. Additionally, we also provide novel analytical 
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differential entropy bounds for the one-dimensional GMM ease. To the best of our knowledge, 
this is the first time that an information theoretie eharaeterization has been used to evaluate 
human pereeption of audio quality by direetly measuring the brain aetivity of the subjeets using 
a high resolution EEG. 

The rest of the paper is organized as follows. In Seetion II we provide an overview of EEG, 
the ERP ehannel, the experiment, and the stimulus audio sequenees. In Seetion III we analyze 
the ERP ehannel using an information theoretie framework and ealeulate the MI over this end- 
to-end ehannel. The results are presented in Seetion IV, whieh is followed by a diseussion in 
Seetion V. We eonelude with a summary of our study and future direetions in Seetion VI. 

II. Background 


A. Electroencephalography (EEG) 

EEG is an observation of the eleetrieal aetivity in the brain reeorded over a period of time 
using eleetrodes plaeed on the sealp. As mentioned earlier, a single EEG eleetrode measures the 
sum eleetrie potential resulting from the synehronous aetivity of several hundred million neurons 
averaged over tissue masses. The large spatial seale of EEG reeordings make them unsuitable 
for studying the aetivity of neurons or small eell assemblies. Instead, EEG provides a robust 
measure of eortieal brain aetivity and is assoeiated with eognition and behavior. 

While EEG is affeeted by stimulus, it does not require one and oeeurs even in the absenee of 
external stimulus, for, e.g., brain aetivity reeorded during different sleeping stages. Event-related 
potentials (ERP) typieally represent time averaged EEG data reeorded speeifieally in response 
to a stimulus like light strobes, audio tones, ete. Eor this reason ERPs are assoeiated with state- 
dependent brain proeessing like seleetive-attention, task eontext, memory (reeognition), and other 


ehanges in mental state p7| |. 

Typieally, EEG studies attempt to tie speeifie eognitive proeesses to averaged data. While these 
extraeranial reeordings provide estimates of large-seale synaptie aetivity and are elosely related 
to the endogenous brain state, the most appropriate way in whieh to eombine the data in order 
to aehieve this goal is not elear. Common methods of analyzing and proeessing reeorded EEG 
inelude epoehing, averaging, time-frequeney analysis and linear diseriminant analysis, eorrelation 
and eoherenee rely on linear dependeneies and assume the EEG signal to be stationary. In 
eontrast, employing mutual information (MI) is well suited for modeling nonlinear, non-stationary 
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phenomena like EEGs as it is based on empirieally observed probability distributions and also 
performs an implieit averaging. 


B. ERP Channel 

Our goal here is to use MI to quantify the information transfer between the input audio 
stimulus and the reeorded EEG. In order to aehieve this goal, we model the overall transmission 
ehain eomprising of stimulus generation, proeessing by the human brain, and the EEG sensors 
as a eommunieation ehannel (Eig. [^a)). In the neuroseienee literature sueh a ehannel in general 
is referred to as the ERP ehannel [ |38l . 



Event 


ERP Channel 


EEG 


(a) Communicating over the ERP channel. 


Tx 


V 



(h) Single-input multiple-output channel (SIMO) channel. 

Fig. 1. The end-to-end perceptual processing channel is analogous to a SIMO communication channel. The input to the channel 
is the audio quality level and the output is the EEG response. 

In our case, the ERP channel is equivalent to a single-input multiple-output (SIMO) channel 
with unknown characteristics. A SIMO channel model is often used in wireless communications 
to represent a system with a single transmitting antenna and multiple receiving antennas as 
shown in Pig. [^b). In particular, for the ERP channel under consideration, the quality of the 
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audio stimulus represents the (single) input, and the observation at the EEG sensor electrodes 
on the scalp is the (multiple) output of this channel, respectively. 


C. Experiment 

To collect the data required for the study in our experiments, test subjects were presented with 
a variety of different audio test sequences whose qualities varied with time. The EEG activity 
of the subjects was recorded as they passively listened to these test sequences via a pair of high 
fidelity headphones (see Eig. [^. 


EEG Electrodes 


User Interface 



Fig. 2. The EEG experiment setup. 

Each trial consisted of a computer controlled presentation of an audio test sequence. At the 
end of the trial, the user interface prompted the subject to continue on to the next trial. A 128- 
channel ActiveTwo Biosemi EEG capture system monitored and recorded the brain activity of 
the subjects during the trial. The EEG system comprised of electrodes, connecting wires, and a 
DSP. The DSP re-referenced and amplified the weak signals picked up by the electrodes on the 
scalp and then converted the analog signal into a digital signal for storage. The DSP was also 
connected to the computer and used to synchronize the recordings at the start and end of each 
trial. 

The trials were conducted in a room specifically built for recording EEGs with a RE shielded 
testing chamber. A total of 28 test subjects, all with normal hearing capability, participated in 
the experiment with the majority of them being male. 
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D. Stimulus 

All the audio test sequenees used as stimulus were ereated from three fundamentally different 
base-sequenees. The base-sequences were sampled at 44.1kHz with a precision of 16 bits per 
sample. Two different types of distortions were implemented - scalar quantization and frequency 
band truncation - with each distortion type in itself having three quality levels of impairment 
“Ql”, “Q2”, and “Q3”, respectively (see Table |I]). Thirty second long test sequences were created 
from the base-sequence by selecting one of the two distortion types and by applying the three 
possible impairment levels in a time-varying pattern over the whole duration of the sequence. 
Fig. 1^ shows one of the many possible test sequences used in the study. The criterion and use of 
these test sequences and distortion quality levels as a metric for quality perception is described 
in detail in [ [39l . 

Note that despite the subjects were presented with all different quality levels in our listening 
tests, in our analysis we focus only on the high base-quality and the “Q3” degraded quality 
audio as inputs to the ERP channel. This addresses the worst-case quality change and keeps the 
problem analytically and numerically tractable. This choice is also supported from a subjective 
quality assessment viewpoint as it easier to distinguish between ‘good’ and ‘bad’ audio, as 
opposed to ‘medium-good’ or ‘medium-bad’ qualities. 

Typically, responses to a stimulus observed by EEG measurements are averaged over different 


trials to remove inconsistencies between these trials [40|. Evaluating the response to repeated 
trials provides a measure of discriminability on the given task, allowing to reliably judge which 
state changes. Multiple presentations of both the impaired test sequences and the unimpaired 
reference base sequences were made in a randomized fashion. Also, to remove any possible 
bias, all possible combinations of quality changes, distortion types, and base sequences were 
presented to each subject. 


E. EEG Data 

The EEG system captured data on each of the 128 spatial channels, sampled at 1024 Hz. An 
average re-referencing and baseline removal was performed to remove any DC noise component, 
which is often introduced over time into the EEG data usually as a result of steady brain activity, 
but also due to muscle tension, signal fluctuation, or other noise sources. Einally, the data was 
passed through a high-pass filter with a cut-off frequency at 1 Hz, and a transition bandwidth of 
0.2 Hz. 
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TABLE I 

Different quality levels presented to the subject during the course of an audio trial. To generate the 

DISTORTION, EACH OF THESE BASE-SEQUENCES WERE PASSED THROUGH A 2048-POINT MODIFIED DISCRETE COSINE 
TRANSFORM (MDCT) AND EITHER THE FREQUENCY TRUNCATION OR THE SCALAR QUANTIZATION WAS APPLIED TO THE 

COEFFICIENTS PRIOR TO RECONSTRUCTION | f^ . 


Quality Level 

Freq. Truncation 

Scalar Quantization 


Low Pass Filter 

No. of Significant Bits Retained 

Ql 

4.4 KHz 

4 

Q2 

2.2 KHz 

3 

Q3 

1.1 KHz 

2 


Buffer 


5s 


Buffer 



High base- 
qualitv 

Q3 

Q2 

Q1 

Q3 

High base- 
qualitv 



◄-► 

30s 


Fig. 3. A 30 second test sequence where the audio quality changes in a time-varying pattern over the whole duration of the 
sequence. Different possible combinations of quality changes and two distortion types were presented to each subject in a 
randomized fashion. 


We notice that for each trial we have a very large multidimensional data-set from which to 
attempt and extract signals corresponding to changes in the human perception of audio quality. 
To better manage the large amount of collected data while effectively mapping the activity across 
different regions of the brain, we suggest grouping the 128 electrodes of the EEG-system into 
specific regions of interest (ROI). The grouping scheme we use is shown in Eig. Q and is similar 


to the one used in [411. While a large number of potential grouping schemes are possible, this 
scheme is favored for our purposes as it efficiently covers all the cortical regions (lobes) of the 
brain with a relatively low number of ROIs. Eor example, regions 5 and 6 cover the temporal 
lobes, while region 7 spans the parietal lobe. Also, the nearly symmetrical distribution of the 
ROIs in this scheme allows for an easy mapping and comparison of EEG activity between 
different cortical areas. 
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Fig. 4. Schematic representation of the 128-channel EEG system. The electrodes are grouped into eight regions of interest 
(ROI) to effectively map the activity across different regions of the brain. 

III. Information Theoretic Analysis 

Notation: Throughout this paper, random variables are denoted by upperease letters X and 
their (deterministie) outeomes by the same letter in lowerease x. Probability distributions are 
denoted by p{x). A eontinuous random process is denoted using a time index Y(t). Multivariate 
random variables are denoted with an underlined uppercase letters F. Deterministic vectors are 
represented as smallcase underlined letters p, while matrices are denoted with boldface C. The 
expected value of a function is denoted as E[-]. 


A. ERP Channel 

In our discussion so far, we have shown that the ERP channel under consideration (see Fig. 
is analogous to a SIMO communication channel. The capacity of SIMO channels has been 


previously analyzed in [42|. However, since determining the capacity achieving input distribution 
for the ERP channel is virtually impossible to obtain in an experimental setup, we focus on 
analyzing the MI for a uniform input distribution (symmetric capacity) as outlined below. Further, 
since the ERP channel is non-linear, non-stationary, potentially time varying, and does not have an 


explicitly defined channel characteristic, the expressions stated in [[42| for the direct computation 
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of its MI are not applicable. We begin our analysis therefore by taking a closer look at the only 
known parameters of our model, viz. the input and the output random variables. 

The input random variable X of the ERP channel is uniformly distributed over a set of class 
indices X which describe the quality of the stimulus sequences at any given time interval. As 


discussed in Section II-C we restrict ourselves to \X\ =2. Then, the audio quality of the input 
sequence can be represented as an equiprobable Bernoulli distribution 


{ Xi, if the input stimulus is of high quality, 

X 2 , if the input stimulus is of degraded quality level ‘Q3’, 

with probabilities 


p{xi) = p{x 2 ) = 1/2. 


( 1 ) 


The output of the channel is given by the set C M” containing all possible values of the 
EEG potential at any given time interval. Eor a total of n ROIs we therefore get a (multivariate) 
output vector of random processes Y_{t) = {Yi{t),... ,Yn{t)). To reduce the variance of the 
ERP channel measurements we consider every electrode in the i-th ROI as an independent 
realization of the same random process i = 1,... ,n. This is supported by our grouping 
scheme wherein all electrodes of a ROI are located within close proximity of one another and 
capture data over the same cortical region. Eurther, for tractability we assume the random process 
to be stationary within the five second non-overlapping blocks of a trial with constant audio¬ 
quality. The probability function py(-) is therefore not dependent on time and can be written as 
Py_ {vit)) = PY_{y) without any loss of generality. Note that this assumption does not rule out 
any non-stationary behavior between sections of different audio quality within the same trial. 

The relationship between the input and output of the ERP channel is characterized by its 
conditional probability distribution. Since the input can take on two distinct values, there are 
two marginal conditional distributions p{yi\xi) and p{yi\x 2 ) corresponding to any given ROI. 
Pig. shows the marginal conditional distributions obtained via histogram measurements of a 
single ROI output over time. A detailed inspection using different subjects, input sequences, 
and ROIs allows us to assert two important facts about this distribution. First, the conditional 
distribution converges to a Gaussian with zero mean. The potential recorded at the EEG electrode 
at any given time-instant can be considered as the superposition of responses of a large number of 
neurons. Thus, the distribution of a sufficiently high number of these trials taken at different time 
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Fig. 5. Marginal conditional distributions p(yi\xi) and p(yi\x 2 ) of subject S4 over a single ROI, for high quality and frequency 
distorted audio input-stimulus, respectively. The Gaussian fit is obtained by using an estimator that minimizes the L\ distance 
between the fitted Gaussian distribution and the histogram data. 


instances converges to a Gaussian distribution as a result of the Central Limit Theorem (CLT). It 
then follows directly from the CLT that the probability distribution for n ROIs will also converge 
to a n-dimensional multivariate Gaussian distribution. Second, we observe from Fig. that there 
is a difference between the variances of the distributions p{yi\xi) and p{yi\x 2 ). This indicates 
that Yi is dependent on X, i.e., the EEC data is related to and contains “information” about 
the input stimulus, which is characterized by the MI I(X;Yi) as a measure of the information 
transfer over the ERP channel. 

B. Entropy and Mutual Information 

The entropjQ of a (continuous or discrete) random variable X with a probability distribution 
p{x) quantifies the amount of information in X, and is defined as 

h{X) =E[—logp{x)]. (2) 


1 


In the following, we simply use the denomination “entropy” even when referring to differential entropy for the case of 


continuous random variables. 
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Unless otherwise speeified we will take all logarithms to base 2, thus measuring the entropy and 
MI in bits. For two random variables X and Y with a joint probability density funetion p(x, y) 


the MI between them is defined as 

I{X-Y)=¥.xy 


log 


p{x,y) 

p{x)p{y) 


(3) 


In this work we are interested in the MI between the ERP ehannel input X and output F. 
Therefore, a moderate-to-high value for I{X;Y_) indieates that the EEG data output is related 
to the input stimulus, i.e., eontains information about the audio quality. 

If we eonsider the eonditional probability distributions eorresponding to n ROIs ehosen 
simultaneously, then eaeh of the distributions p{y\x = 0) and p{y\x = 1) are n-dimensional 
Gaussian distributions, with Ci and C 2 being the n x n eovarianee matriees of eaeh of the 
distributions respeetively. Eig. shows the eovarianee matriees of a subjeet over n = 8 regions 
for a single trial. The eonditional entropy h{Y_\X) is then given by 

00 

hiY\X) = -'^pi^x) p{y\x) logp{y\x)dy 

-6^ io 

= |{log(27re)’"|Ci|+log(27re)”|C2|}, (4) 


where we have used the elosed-form expressions for the differential entropy of a multivariate 


Gaussian distribution [12, Theorem 8.4.1]. Using the law of total probability we ean write p{y) 
in terms of the eonditional probabilities as 


p{y) = ^p{Y=y\x)p{x) 


xex 


= 2 [pivl^i) + piy\^2)} ■ 


(5) 


As both p{y\xi) and p{y\x 2 ) are Gaussian distributed, the resulting random veetor y is a 
multivariate Gaussian mixture with a distribution p{y) given in Q. Therefore, the entropy of Y_ 
is given by 


Mr )=-2 



■log ( - [p{y\xi)+p{y\x 2 )] ] dy. 


( 6 ) 


The MI between the output EEG data and the input audio stimulus ean then be ealeulated using 


I{X-Y) = h{Y)-h{Y\X)- 


( 7 ) 
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(a) (b) 

Fig. 6. The covariance matrices corresponding to the mulitvariate Gaussian conditional distributions (a). p{y\xi) and (b). p{y\x 2 ) 
of subject S4 over n=8 ROIs, indicating the high correlation between the different regions. 


C. Entropy approximation 

The entropy of the output random variable Q is the entropy of a mixture of two Gaussian 
distributions. It turns out, to the best of our knowledge, that there is no closed form solution 
for the entropy of a Gaussian mixture model (GMM) which is also a recurring open problem 
in the literature. The expression for the entropy of a GMM consists of a multiple integral over 
a logarithm of a sum of exponential functions which makes it difficult to formulate a general 
closed-form analytical solution. In the absence of an analytical solution it is usually common 
to use a numerical method to calculate a sufficiently accurate estimate of the solution. How¬ 
ever, as the dimensionality of the Gaussian multivariate random variable increases, it becomes 
computationally infeasible to evaluate the n-order multiple integral using numerical integration. 
This drawback becomes even more apparent in our case where several test subjects each with 
multiple test sequences must be processed. In keeping with our primary goal of evaluating the 
overall information transfer over the ERP channel, we take a closer look at this open problem 
in the following subsections and propose two novel low-complexity methods to estimate the 
differential entropy of a GMM. 

1) Single region entropy bounds: In the following we derive new, tight upper and lower 
bounds for the entropy of a single ROT The critical factor in the entropy calculation of the 
Gaussian mixture is the logarithm of the sum of individual distributions. By employing the 
Jacobian logarithm we obtain an alternate expression for the logarithm of a sum 


ln(e“ + e^) = max (a, h) + In (1 + , 


( 8 ) 
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where a and b are eonstants. The expression in ([^ ean be bounded as 

max (a, b) < ln(e“ + e^) < max (a, 6) + In 2 , 

where equality on the upper bound is aehieved when a = b. Substituting 

1 —y'^ 

e ^ — exp 

(Ji 2a( 

and —)■ — exp 

(T2 


( 9 ) 


2^2 ’ 


( 10 ) 


we obtain the following bounds 


max 


y 


2al 


+ In cTi , 


y 


2ai 


+ In (72 ) < In 


CTl 


< max 


y 


2af 


+ In cTi , 


+ 


y 


0-2 


2ai 


+ 111(72 )+In ( 1 + ^ ) . (11) 


Here af, are the varianees of the two Gaussian distributions piyjxi) and p(ylx 2 ), respeetively, 
ehosen sueh that (72 > (7i. We ean now use the lower bound in ( [TT] ) to ealeulate the upper bound 
for the maximum entropy h(V) for a single ROI, 

oo 2 2 

h(V) <-^ j [p(i/|a;i)+p(|/|a; 2 )] ■ max + In2y^c7i , ^ + \n2V^a'^dy 

— OO 

A ^ 

= J[p{y\xi) + p{y\x 2 )] ■ + In27^(71^ dy 


[p{y\xi)+p{y\x 2 )\- \^ + \n 2 V^(T 2 ] dy, (12) 


2a, 




where ±A are the points of interseetion of the two eomponent Gaussian densities of the GMM. 
The points of interseetion ean be found by equating the two Gaussian densities and solving, 


{1 /= (1/\/^a2)e^ 

afal ■ (Inaf - lna|) 


A = 




(13) 


The two integrals in (12) have a elosed form solution and ean be solved using the expressions 
for truneated normal distributions as 


h(V) < ^ f-—Q + ~ ) + ln(2A/^ai) ■ 
2V ai V(7i/ Cl 


1 1 ■ 

-1- 

Cl C2 




A 


+ 7 ^ (-Q ( — ] “I- 

2(7f V cr2 \0-2y C2 


A 
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2ai 


CTl 


1 1 ■ 

di (^2 


1 

+ 2 


■— Q [ — ) + ;r 

(72 \a2j Ct2 


(14) 


where 


1 


Cl = 


di = 


$(A)-$(0) 




C2 = 


do — 


1 




(15) 


and <!)(■) is the standard normal eumulative density funetion, and Q(-) is the standard normal 
probability density funetion. A similar approaeh ean be used to ealeulate a lower bound on the 
minimum entropy of h(Y) by using the upper bound in ( [TT] ). 

This method is sueeessful in providing tight bounds for the the entropy of a mixture of 
two sealar normal random variables, i.e., the entropy for an individual ROI. A limitation of 
this method however is that it is not effeetive for evaluating multivariate normal distributions 
and is therefore unsuitable for evaluating the entropy of multiple ROIs ehosen simultaneously. 
In partieular, a mixture of a n-dimensional normal distribution will have interseetions in a n- 
dimensional spaee and the integral in ( [T^ , if it even exists, will be extremely eomplex. 

2) Multiple region entropy approximation: In order to address these shorteomings for higher 
dimensional EEG outputs Y_ we propose an approximation for the entropy by performing a 


eomponent-wise Taylor series expansion of the logarithmie funetion [^. This approximation 
makes no prior assumptions and is suitable in general for estimating the entropy of any given 
multidimensional GMM. 

Eet Pi{z) ~ Af{z-, /i., Cj) be the probability distribution for the Ath eomponent of the GMM 
assoeiated with the n-dimensional Gaussian distribution Af{z; Cj) with mean p. G MA and 
eovarianee matrix C* G The probability distribution of the GMM is then given by 

L 

pU) = ^WiPi{z), (16) 


2 = 1 


where L is the number of mixture eomponents, and Wi denotes the weight of the Ath eomponent 
of the GMM, respeetively. Also, let 


f(z) = logp(z) = log I '^WjPji: 

Vi=i 

If we then use the Taylor series to expand the funetion f{z) around /i , we obtain 


(17) 


f'{p.) 

m = f{p^ + -^{z-p^) + 


2 ! 


(^ - PA 


+ ... 


(18) 
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The odd central moments of a Gaussian distribution are zero [ |44| , [ |45| . Therefore, all the odd 
order terms in the Taylor series expansion of the entropy of the Gaussian mixture are also zero, 
resulting in the following expansion 


h{z) = — p{z) logp(z) dz = — p{z)f{z) dz 


I WiPi{z) ■ <{ /(^.) + - pf + ...}dz. 


i —1T 


2 ! 


(19) 


The Gaussian mixture is a smooth function and the derivatives of the function f{z) will therefore 
always exist. The Taylor series approximation is exact only if an infinite order of terms are 
considered, and the accuracy of the approximation therefore depends on the number of expansion 
terms used. We provide closed-form expressions for the zeroth-order, second-order, and fourth 
order terms of the Taylor series expansion in the Appendix at the end of this paper. 


The Taylor series in ( [181 ) is expanded around the mean of the distribution. As the variance 
of the distribution increases, the data set is spread further away from the mean and as a result 
a larger number of higher-order terms are required to reduce the approximation error. Also, if 
one of the distributions has a relatively high variance compared to the other distributions in the 
mixture, then the corresponding n-dimensional pdf is skewed heavily along that particular axis. 

Calculating a large number of higher order terms is complex and computationally demanding. 
In order to obtain a high accuracy approximation while keeping the number of expansion terms 
constant, we propose using a variance splitting approach [|43|, [|46|. In this approach we split 


and replace the high variance Gaussian component with a mixture of Gaussians, each with 
a substantially lower variance than the original. The i-\h component with i = 1, 2,..., L for 
splitting is identified and the corresponding covariance matrix Cj is diagonalized as 


CiYji , 


( 20 ) 


such that 


D, = diag(A®,A®,...,A<'> 


(21) 
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is the diagonal matrix containing the eigenvalues of the matrix C*, and Sj is the diagonalization 
matrix whose columns are the eigenvectors of Cj. The z-th mixture component is then split along 
the principal axis of its longest ellipsoid into K sub-components as 

K 

Wi ■ Pi{z) = ■ Wi ■ p'f’ {z) , (22) 

k=l 

where Wk,Pk, and Pk are splitting parameters. Further, we define 

K 


(i) A - 
Wl = WkWi 

with 

(*) 

k=l 


l^k - + \ 



= diag ( 

If.... 

\(d ^2 Pi) 

1 T' ki ^d+1) • • • ) '^n J ) 

(23) 

where is the largest eigenvalue 

of Ci. 

The new covariance matrices, 

, for the sub- 

components are obtained by rotating 

back 




Cf = E.D<->Sf, 


(24) 


defining the distributions ~ The parameters Wk, Pk and ak are directly 






estimated using the splitting library shown in Table 1. The library was generated for the standard 


Gaussian density (zero mean, unit variance) using the splitting technique used in [461, [47 j. The 
chosen Gaussian component is first split into two mixture components starting with an initial 
guess 


w exp < — 


1 (x — pY 


Wi exp < — 


1 (x - pif 
d? 


+ '«;2exp <; -- 


1 (X - P2f 


(To 


(25) 


where 


pi = p-e, p2=p + e, (Ji = (72 = cr, Wi = W 2 = w/2, 


(26) 


and e = 0.001 is a small constant. A least-squares error minimization is then performed wherein 
the standard deviation of the split components are recursively reduced while adapting the means 
and weights accordingly. When no further optimization is possible, the resulting two mixture 


densities can again be split according to (25) yielding four mixture components. The results of 
using this library for both a two-way and a four-way-split of the standard Gaussian are depicted 
in Fig. 1^ While maintaining a high degree of accuracy, this split is not perfect and introduces 
a marginal amount of error. In theory, it would require an infinite sum of Gaussian distributions 


on the r.h.s. of ([25]) to exactly represent the single Gaussian distribution on the l.h.s. of (|25|). 
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TABLE II 

Splitting library for the standard Gaussian density with zero mean and unit variance | [4^ , | |47] |. 



k 

Wk 

^ k 


Two Way Split (M 

= 2) 





1 

0.5000 

0.7745125 

-0.56520 


2 

0.5000 

0.7745125 

0.56520 

Four Way Split (M 

= 4) 





1 

0.127380 

0.5175126 

-1.41312 


2 

0.372619 

0.5175126 

-0.44973 


3 

0.372619 

0.5175126 

0.44973 


4 

0.127380 

0.5175126 

1.41312 



Z 


Fig. 7. Variance split of a standard Gaussian using the splitting library in Table [11] 


Fig. exemplarily shows the simulation results for the entropy approximation of a sample 
GMM with n = 2 as its varianee is inereased, ealeulated by using a fourth-order Taylor series 
expansion, with and without the varianee split. The results indieate an inerease in the overall 
accuraey of the final entropy estimate when using the varianee split. We have observed that the 
splitting approaeh is espeeially helpful at higher varianees and, if required, ean be performed 
repeatedly to further refine the approximation. 
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Fig. 8. Entropy approximation using a fourth-order Taylor expansion for a sample synthetic GMM with zero mean vectors, equal 
weights, and Ci = [4, 2; 2,4], C 2 = [5 +p, 2; 2, 5 +p], where the variance of the second distribution is incremented using the 
parameter p. The true entropy was calculated using numerical integration which in this case for n = 2 is still computationally 
tractable. 


IV. Mutual Information Results 

The MI can be trivially upper bounded as I{X;Y_) < H(X) = 1 bit, where we have used 
the fact that X is drawn from an equiprobable Bernoulli distribution. This upper bound depends 
only on the quality of the audio stimulus, independently of the subject, the audio sequence, the 
pre-processing of the EEG data, or the number of ROIs considered. Therefore, the maximum 
information that can be transferred over the ERP channel for the given input is 1 bit. 


A. Single region 

We first evaluate the MI over a single region. Choosing only one ROI at a time corresponds 


to Y_ being univariate, i.e., n = 1. Table III lists the MI results calculated for each ROI for one 
test subject. Also shown are the lower and upper entropy bounds obtained using the Jacobian 


Eog bound introduced in Section III-Cl 


The output of the ERP channel maps the activity spread over the entire cortex distributed over 
all 8 ROIs. Considering only a single region then ignores the fact that the ROIs are coupled 
due to the shift of neural activity between the regions. Therefore, the available information is 
severely limited when only a single ROI is considered. It is interesting to note however, that the 
temporal regions 5 and 7 show a higher MI than the other regions. This is in strong agreement 
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TABLE III 

Single region entropy bounds and mutual information for subject S4. 


ROI 

Entropy 

Lower Bound 

Upper Bound 

MI 

1 

3.2512 

2.9676 

3.5549 

0.0464 

2 

3.4257 

3.1781 

3.7633 

0.0484 

3 

3.2650 

3.0493 

3.6128 

0.0727 

4 

3.0395 

2.7990 

3.3828 

0.0498 

5 

3.3441 

3.1937 

3.7105 

0.1481 

6 

3.1016 

2.8841 

3.4354 

0.0427 

7 

3.3437 

3.1089 

3.6840 

0.0890 

8 

3.5481 

3.2253 

3.8623 

0.0121 


with the fact that the the primary auditory cortex which is involved with the perception and 
processing of auditory information, is located in the temporal lobe. 


B. Multiple regions 

To calculate the entropy over multiple regions we use the fourth-order Taylor series approxima¬ 


tion presented in Section III-C2 A four-component variance split is performed twice to further 
refine the approximation result. Fig. shows the final MI estimates of a single subject for 
each trial using different combinations of the base music sequences and time-varying distortion 
patterns. Also, calculated and shown in the same figure is the median MI across all trials for 
each of the two distortion types. The median is chosen here to remove any outliers among the 
trials. We observe that the MI over the ERP channel for a given trial is in general moderate 
to high. Further, as one would expect, there is a significant increase in the MI calculated over 
multiple ROIs, when compared to the single ROI case. 

The median individual MI estimates for different distortions and each subject are summarized 


in Fig. 10(a) We notice that in general for a given subject there is only a minor variation in 


the median MI between the different distortion types. Further, Fig. 10(b) shows the median MI 


of each subject calculated across trials for different base music sequences used. By comparing 
Fig. 1 10(a) and Fig. 10(b) we observe a similar trend, and subjects that have a high MI in one 
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X Freq. Truncation (FT) 

O Scalar Quantization (SQ) 


6 8 10 

Trial Number 


12 14 


16 


Median FT 
Median SQ 


Fig. 9. MI estimates for subject S4 for the two considered distortion types, conducted using different combinations of base 
sequences, and time-varying distortion patterns. 


experiment also have a high MI in the other. Overall, the MI is found to be eonsistent aeross 
the entire pool of test subjeets aeross different trials, distortion-types, and musie-types. 


C. Confidence intervals for the MI using bootstrapping 


The bootstrap [ [48| |, [ |49| is a powerful resampling teehnique for assessing the aeeuraey of a 
parameter estimate (in this ease the median). It is espeeially useful in making asymptotieally- 
eonsistent statistieal inferenees when only a limited number of sample realizations are available. 
We derive eonfidenee intervals (Cl) for the median of the MI estimates using the bootstrap-? 


pereentile method [50|. It has been shown [511 that the bootstrap-? pereentile intervals have a 
higher order aeeuraey as eompared to regular bootstrapping by better aeeounting for the skewness 
and other errors in the original population. 

The original sample set eonsists of the aetual MI estimates ealeulated for eaeh of the trials. 
The median of this sample is denoted by M. Bootstrap resamples are then ereated by drawing 
samples, with replaeement, from a population of this original sample set. The median for eaeh of 
this resampled set is ealeulated and denoted as Ml with b = 1,2,... ,B. Generally, B is a large 
number in the order of several hundred. For our ealeulations here we use B = 1000 resamples. 












Mutual Information (bits) 
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(Max Ml) 1 


0.8 

0.7189 

0.6581 

0.6 




Scalar Quantization 
I 1 Freq. Truncation 


■ Upper Cl 

■ Lower Cl 


13 14 21 27 15 20 16 4 5 8 3 


23 12 10 11 25 17 24 26 22 28 

Subject Number 


7 19 9 


(a) Different distortion types 


Max (Mi) lr 


^ 0 . 7189 - 
5 0.6581 - 
I 0.6- 

<5 

E 

o 


I Classical 
^■Pop 
I I Rock 


13 20 15 27 21 14 23 12 8 


4 16 3 10 11 17 25 24 5 22 28 18 6 26 2 

Subject Number 


1 9 7 19 


Upper Cl 
Lower Cl 


(b) Different music types 


Fig. 10. Ordered median MI estimates for each of the 28 test subjects when presented with the same set of trial- 
sequences using a combination of different distortion types, patterns, and music sequences. Confidence intervals 
(Cl) for the median are obtained using bootstrapping ||48), |49|. 
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TABLE IV 

MI CONFIDENCE INTERVALS ON THE ERP CHANNEL IN BITS, OBTAINED USING BOOTSTRAPPING. 


Type 

Lower Cl 

Upper Cl 

Distortion 

Freq. Truncation 

0.6162 

0.6937 

Scalar Quant. 

0.6625 

0.7087 

Music 

Rock 

0.6594 

0.7286 

Pop 

0.6352 

0.7321 

Classic 

0.6509 

0.7101 

Overall 

0.6581 

0.7189 


The l-parameter is defined by normalizing the differenee between the original sample statistics 
(in this case the median) and the resampled statistics over the standard error 

M -Ml 


f — 
I'h — 


a'' 


(27) 


where cx*^ is the sample variance and s is the number of elements in each resampled set. 
The empirical standard deviation a* is itself calculated by performing a nested bootstrap from 
resamples of M^. 

We therefore get a total of B values of the t-parameter (f^,..., , fg), from which we 

calculate the required percentile cutoffs of our CIs. For our analysis, we take the 2.5th and 97.5th 
percentile cutoffs and calculate the lower and upper Cl, respectively, as 


CIlow — M — t 25 (J*, 
Clhigh = M — tgj^a*. 


(28) 


Table IV lists the confidence intervals for different distortion and music types. Both upper and 


lower CIs are also included in Fig. [TO 


V. Discussion 


MI quantifies the amount of information the recorded EEC contains about the quality of the 
input audio-stimulus. By measuring how aligned the brain activity is in response to the stimulus 
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we thereby directly quantify the subject’s perception with respect to the change in audio quality 
with time. A high MI therefore indicates a large correlation between the actual and the perceived 


audio quality. For example, the fact that subjects 13 and 27 have a high median MI in Fig. 10 


implies that these subjects have a strong perception towards quality change irrespective of the 
music or distortion type used in the test sequence. 

However, brain dynamics are fundamentally multiscale in nature and it is often difficult to draw 
a one-to-one relation between the neural electrical field dynamics and the cognitive/behavioral 


state [52|. The EEG activity is dependent on the psychological state of the subject including 
concentration, motivation, and fatigue. As a consequence, the recorded EEG activity varies 
considerably not only over the duration of a trial, but also from one trial to another, which 
could result in low MI, for example as it is the case with subjects 7, 9, and 19. 

It is also important to note the significance of choosing the ROIs. The output EEG activity 
recorded over the ERP channel is dependent on the number of regions, the size (i.e., the number 
of individual electrodes within each ROI), and the location of the ROIs on the scalp. Increasing 
the number of regions increases complexity whereas considering too few oversimplifies the 
problem, as demonstrated by the single ROI results in Table IITO In our work we select the 


regions based on the cortical lobes (see Section II-E). The ROI selection could possibly be 
improved by identifying the distribution of virtual sources inside the brain which generate the 
EEG activity on the scalp. A source localization of the EEG signal is commonly referred to 
as the inverse problem and while several solutions have been suggested to varying degrees of 


accuracy [53| it continues to be an undetermined problem [52|. 


VI. Conclusion 

In this work we have presented a new framework for audio quality assessment based on 
observed EEG data for subjects listening to time-varying distorted audio. By modeling the end- 
to-end perceptual processing chain as a discrete-input time-varying nonlinear SIMO channel with 
memory, we are able to quantify how the subjective perception changes if the observed audio 
quality changes with time. We have focused on the simplest binary-input SIMO case where 
the audio quality can only change between distorted and undistorted levels. The MI results 
demonstrate the practical applicability of this approach and provide a performance measure for 
the information transfer over the ERP channel, which can be computed via a new low complexity 
algorithm. This algorithm is based on a fast approximation technique for the differential entropy 
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of a multidimensional GMM. The proposed approach can be extended to video quality perception 
and to other neuroimaging assessment techniques like MEG, albeit with a higher complexity. As 
potential future work the proposed information theoretic framework can be exploited to develop 
an inference model for blind audio quality classification based on perception. 


Appendix 


In this appendix we derive the Taylor series expansion of the entropy up to its fourth order 
component. The Taylor series expansion of the entropy for the GMM can be written as 


h{z) = -^JWiPiU) ■ i f{p.) + 

* = lgn f 

= /Iq T ^2 T ^4 T R-: 


2 ! 


■{z — /i.)^ + .. .> dz 


(29) 


where ho, h 2 , are the zeroth, second and fourth order expansion terms respectively, and R is 
the residual sum term representing all the remaining higher order terms. 


A. Zeroth-order term 

The zeroth-order Taylor series expansion term ho is given by 

ho = jWiPi{z)f{ix.) dz 

^ — 1 iiiarj 


^Wi \ogp{[M 


(30) 


2 = 1 


which follows from the fact that the integral over the entire probability distribution is 1. 


B. Second-order term 

The second-order expansion term requires us to calculate the gradient gj and Hessian Hj of 
the Gaussian probability distribution pj(^) ~ A/’(^; p., C*), which are respectively defined as 

gi= Vpi{z) = Pi{z)C-\p.- z), (31) 

(VV^)p,(^) = ^-p^{z)C-\ (32) 

Pi{z) 

where V denotes the vector differential operator and i = 1,2,... ,L. Accordingly, the gradient 
and Hessian Hj for the log-density function f{z) can be defined as 

5i = V/(^) = Vlogpj(^) = —(33) 

PiU) 
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H.4(VV-)/(„ = _^g,gr + ^H.. (34) 

We can then calculate the second-order term as 

/i 2 = - ^ fwiPiiz) ■ ■ {z - p.){z - p.)'^ dz 

7 — 1 ^ 

-L]^n 

1 ^ 

= . (35) 

Z ' ^ z=u 

i=l - 

The symbol o denotes the Frobenius product A o B = ^ ^ Ojj ■ bij. 

i 3 

C. Fourth order term 

The third order partial derivative with respect to pi(z), i = 1, 2,..., L, is given by 

T. 4 H,0 (c-'(2 - a)) - fe®C-‘ - #C-‘gf, (36) 

where A®B is the Kronecker product of the matrices A, B, and 7 )^ is the matrix vectorization 
operator. The vectorization of a matrix is the linear transformation of converting a mxn matrix 
into a mn x 1 column vector by stacking the columns of the matrix on top of each other. Further, 
partial differentiating the Hessian twice yields the fourth order derivative with respect to Pi{z), 

F, ^ (VV^)H, 

= H,®C-^ + Cri(z-^.)®Tf + Cri®H, + C-^®Hf. (37) 

Similarly then, the third and fourth order partial differential matrix for the log-density f{z) are 
respectively defined as 

Ti = —^ 

Pi{zy 

1 

-{HiOgi -f giOHi} 

PiUr 

^ S5) 

-^ {HiO&gf + gi®Hi(g)gf } 

+ {2gi®Tf - (#C-^gf)®gf} 

+ ^ {F, - 2H,®C-i - T,®gf + } . 

Pi{z) 


( 39 ) 
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The fourth order term is then given by 


hi 


1 — 1 

TTDrj 


U)^i —dz 


— ^Wi-¥io{Ci®C] 


2=1 


Z=ll . 


(40) 
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